The recent WMAP3 results have placed measurements of the spectral index ns in an interesting
position. While parameter estimation techniques indicate that the Harrison-Zel'dovich spectrum 
ns = 1 is strongly excluded (in the absence of tensor perturbations), Bayesian model selection 
techniques reveal that the case against ns = 1 is not yet conclusive. In this paper, we forecast the 
ability of the Planck satellite mission to use Bayesian model selection to convincingly exclude (or 
favour) the Harrison-Zel'dovich model. 

I. INTRODUCTION



One of the key goals of cosmology is to probe the na- 
ture of the primordial perturbations, for instance to seek 
support for the inflationary cosmology. The simplest 
models of inflation predict adiabatic Gaussian density
perturbations of approximately power-law form, characterized
by the spectral index ns, and in addition a spectrum
of gravitational wave perturbations (see Ref. [1] for
an overview).

The ability of experiments, actual or proposed, to explore
such questions is typically framed in terms of parameter
estimation, for instance by forecasting the expected
uncertainty on ns given a particular assumed fiducial
model. However, it has been stressed in a number of
papers recently [?] that many of the key questions
are not ones of parameter estimation, but of model
selection [?]. Model selection problems are characterized
by an uncertainty in the choice of parameters to vary in 
a fit to data, rather than of the values of a parameter set 
chosen by hand. The discovery of any new physical effect 
in data is indicated by the need to include new parame- 
ters, possible examples being non-zero spatial curvature, 
time variation of the dark energy density, or the existence 
of tensor perturbations. Early cosmological applications 
of this technique were given in Ref. [9].

Since many of the most important questions are ones 
of model selection rather than parameter estimation, it 
follows that the capabilities of experiments should also be 
quantified by model selection criteria rather than param- 
eter uncertainty forecasts alone. This is the viewpoint 
adopted in recent papers by Trotta [3], whose Expected
Posterior Odds (ExPO) forecasting technique estimates
the probability of new data requiring new parameters,
and by Mukherjee et al. [5], who use Bayes factor plots to
compare the ability of different experiments to decisively 
select between models. 

Mukherjee et al. [5] illustrated model selection forecasting
using dark energy surveys, looking at a two-parameter
dark energy model versus a cosmological constant
model. The same general approach is applicable in
many other contexts. In this paper, we carry out model
selection forecasting for the Planck satellite cosmic
microwave background project, focussing on its ability to
measure the spectral index ns. This is particularly timely
as the recent release of the three-year WMAP data [10]
has placed this parameter in the zone around three-sigma
where the application of model selection techniques is at
its most crucial [3]. In a companion paper to this one,
Parkinson et al. [11] have shown that the case for ns ≠ 1
is far from decisive at present.

We assume throughout that there are no tensor pertur- 
bations. While it would be interesting to explore models 
including tensors, thus properly probing the inflationary 
space, at present doing ns alone stretches our supercom- 
puter resources to their limit. In this regard model selec- 
tion forecasting is much more challenging than analysis 
of real data, as instead of having a single dataset to an- 
alyze, one has to create and analyze simulated datasets 
for a range of possible models and model parameters. 



II. MODEL SELECTION FORECASTS FOR ns

A. Model selection forecasting 

The philosophical underpinning of model selection 
forecasting was described in Mukherjee et al. [5] and we
summarize it only very briefly here. Given a particu- 
lar dataset, simulated or real, model selection is carried 
out by evaluation of a model selection statistic for each 
model, where the term model refers to a choice of pa- 
rameters to be varied plus a set of prior ranges for those 
parameters. The usual statistic of choice is the Bayesian 
evidence E, also known as the marginalized likelihood. 
The ratio of evidences between two models is known as 
the Bayes factor, B_10 = E(M_1)/E(M_0), where M_1 and
M_0 indicate the two models under consideration. By
plotting the Bayes factor using datasets generated as a 
function of a parameter of interest, one uncovers the re- 
gions of parameter space in which a given experiment 
would be able to decisively select between the two mod- 
els, and also those regions where the comparison would 
be inconclusive. 
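
For concreteness, the standard definitions behind these quantities can be written as follows. The symbols used here (θ for the model parameters, D for the data, L for the likelihood and π for the normalized prior) are notation assumed for this sketch rather than taken from the text.

```latex
E(M) = \int \mathcal{L}(D \mid \theta, M)\, \pi(\theta \mid M)\, \mathrm{d}\theta ,
\qquad
B_{10} = \frac{E(M_1)}{E(M_0)} .
```

If the two models are assigned equal prior probabilities, B_10 is also the posterior odds of M_1 against M_0.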

In short, the advantages of model selection over parameter
estimation forecasting are as follows [?].

• Experiments motivated by model selection ques- 
tions should be quantified by their ability to answer 
such questions. 

• Data is simulated at each point in the parameter 
space, rather than at only one or more fiducial mod- 
els. Indeed in parameter estimation plots people 
commonly simulate data for the model that they 
hope to rule out, rather than for the true model 
that would allow that exclusion. 

• Model selection analyses can attribute positive sup- 
port for a simpler model, rather than only showing 
consistency. 

• Gaussian approximations to the likelihood are not
made, unlike in parameter estimation forecasting
done using Fisher matrices.

In assessing the significance of a model comparison,
a useful guide is given by the Jeffreys' scale [?].
Labelling as M_1 the model with the higher evidence, it
rates ln B_10 < 1 as 'not worth more than a bare mention',
1 < ln B_10 < 2.5 as 'substantial', 2.5 < ln B_10 < 5 as
'strong' to 'very strong' and 5 < ln B_10 as 'decisive'. Note
that ln B_10 = 5 corresponds to odds of 1 in about 150,
and ln B_10 = 2.5 to odds of 1 in 13.
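
As a small illustration of how these thresholds and odds relate, the sketch below classifies a value of ln B on the scale quoted above and converts it to "1 in N" odds, assuming equal prior model probabilities. The function names are hypothetical and chosen only for this example.

```python
import math

def jeffreys_label(ln_b):
    """Classify |ln B| using the thresholds quoted in the text."""
    strength = abs(ln_b)
    if strength < 1.0:
        return "not worth more than a bare mention"
    if strength < 2.5:
        return "substantial"
    if strength < 5.0:
        return "strong to very strong"
    return "decisive"

def one_in(ln_b):
    """Odds of exp(|ln B|):1 expressed as a '1 in N' chance for the disfavoured model."""
    return 1.0 + math.exp(abs(ln_b))

print(jeffreys_label(-3.6))     # strong to very strong
print(round(one_in(2.5)))       # 13
print(round(one_in(5.0)))       # 149
```

The last two values reproduce the "1 in 13" and "1 in about 150" figures given above.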

A model selection analysis of the Planck satellite's ca- 
pabilities to constrain ns was previously given by Trotta 
[3] using his ExPO technique. This seeks to estimate the
probability, based on current knowledge of parameters, 
of the Planck mission being able to carry out a decisive 
model comparison. Our aim is rather different; we seek to 
delineate the parameter values the Universe would have 
to have in order for a decisive model comparison to be 
made. However, we will end by additionally making an
ExPO-style forecast, though with a somewhat different 
implementation to Trotta's. 

Another related paper is Bridges et al. [12], who simulate
data for a model with constant ns and compare
the evidences for a set of initial power spectrum models.
They do not, however, explore different values of the
spectral index.

It may seem strange that model selection approaches 
can give results in apparent conflict with parameter es- 
timation. However, this is a well-known phenomenon 
called Lindley's paradox [?]; the idea that there is
a universal significance level such as 95% beyond which 
things become interesting is inconsistent with Bayesian 
reasoning, which shows that such a threshold should de- 
pend both on the data properties and the prior parameter 
ranges. The Lindley paradox usually manifests itself for 
results with significance in the range two to four sigma 
[?], which as it happens is exactly where WMAP3 has
placed ns.



B. Simulating Planck data 

In order to give a good estimate of Planck's abilities, we 
need accurate data simulations. Simulated Planck data 
was generated by Bridges et al. [12] for their model selection
analysis, but they simply assumed cosmic-variance-limited
temperature anisotropies out to ℓ = 2000 (see footnote 1). We
adopt a rather more sophisticated approach, as follows. 

We simulate the temperature and polarization (TT, 
TE, and EE) spectra. We choose not to include B- 
polarization for simplicity; as we do not include tensors 
there are no primordial B modes, and the shorter-scale 
B-modes generated by gravitational lensing will not supply
significant constraining power on the specific models
we are considering. 

We use three temperature channels, of specifications
similar to the HFI channels of frequency 100 GHz, 143
GHz, and 217 GHz. Following the current Planck
documentation (see footnote 2), the intensity sensitivities of
these channels are taken as 6.8 μK, 6.0 μK, and 13.1 μK
respectively, corresponding to the values quoted for two
complete sky surveys. These are average sensitivities per
pixel, where a pixel is a square whose side is the FWHM
extent of the beam. The FWHMs of these channels are given
as 9.5 arcmin, 7.1 arcmin, and 5.0 arcmin respectively.
The composite noise spectrum for the three temperature
channels is obtained by inverse variance weighting the
noise of individual channels [?]. For polarization we
take only one channel, the 143 GHz channel, of FWHM
7.1 arcmin, and sensitivity 11.5 μK.
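
The inverse variance weighting described above can be sketched as follows. The per-channel noise model used here (white noise per FWHM-sized pixel, deconvolved by a Gaussian beam) is a commonly used form and is an assumption of this sketch; the text does not spell out the exact expression, and the function names are hypothetical.

```python
import math

ARCMIN = math.pi / (180.0 * 60.0)   # one arcminute in radians

# (sensitivity per pixel in micro-K, FWHM in arcmin) for the 100, 143, 217 GHz channels
CHANNELS = [(6.8, 9.5), (6.0, 7.1), (13.1, 5.0)]

def channel_noise(ell, sigma_pix, fwhm_arcmin):
    """Assumed noise spectrum N_ell (micro-K^2) of one channel."""
    theta = fwhm_arcmin * ARCMIN
    white = (sigma_pix * theta) ** 2            # noise variance times pixel solid angle
    beam = math.exp(ell * (ell + 1) * theta ** 2 / (8.0 * math.log(2.0)))
    return white * beam

def combined_noise(ell, channels=CHANNELS):
    """Inverse-variance-weighted composite noise spectrum at multipole ell."""
    return 1.0 / sum(1.0 / channel_noise(ell, s, f) for s, f in channels)

for ell in (100, 1000, 2000):
    print(ell, combined_noise(ell))
```

The composite noise is always below that of the best single channel, and at high multipoles it is dominated by the channels with the narrowest beams.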

The assumed Gaussianity of the spherical harmonic
coefficients of the temperature and polarization leads to
a likelihood function (see e.g. Ref. [?]) that depends on
C_ℓ, the theoretical covariance matrix of the TT, TE and EE
spectra, and on Ĉ_ℓ, the corresponding matrix of estimators.
Both C_ℓ and Ĉ_ℓ include instrumental noise variance. The
fractional sky covered is taken to be 0.8 for all ℓ, and we use
simulated data out to an ℓ_max of 2000.
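
A commonly used form for a likelihood of this kind is sketched below. It is written here as an assumption, not necessarily the exact expression intended in the text: the lower summation limit, the noise spectra N_ℓ and the normalization (chosen so that ln L = 0 when the estimators equal the theory) are choices made for this sketch.

```latex
-2 \ln \mathcal{L} = \sum_{\ell=2}^{\ell_{\max}} (2\ell+1)\, f_{\rm sky}
\left[ \mathrm{Tr}\!\left( \hat{C}_\ell C_\ell^{-1} \right)
     - \ln \left| \hat{C}_\ell C_\ell^{-1} \right| - 2 \right],
\qquad
C_\ell = \begin{pmatrix}
  C_\ell^{TT} + N_\ell^{TT} & C_\ell^{TE} \\
  C_\ell^{TE} & C_\ell^{EE} + N_\ell^{EE}
\end{pmatrix} .
```

Here f_sky = 0.8 and ℓ_max = 2000 as stated above, and Ĉ_ℓ has the same 2×2 structure as C_ℓ with the theoretical spectra replaced by their estimators.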

We simulate data for a range of values of ns. In defining
the fiducial models for which the data are simulated,
the other parameters are kept fixed, those parameters being
the cold dark matter density Ω_cdm, the baryon density
Ω_B, the optical depth τ, the angular size of the sound
horizon at decoupling θ and the power spectrum amplitude
A_s. The specific values chosen were Ω_cdm h² = 0.103,
Ω_B h² = 0.024, τ = 0.14, θ = 1.047 and A_s = 2.3 × 10^-9
respectively, where h is the Hubble parameter in the
usual units and is equal to 0.78 for these parameter
choices. These values were motivated by the WMAP3
results [10]. All parameters are varied in computing
evidences.

1 Trotta [3] also used Planck simulations, but did not disclose how
they were implemented.

2 www.rssd.esa.int/index.php?project=PLANCK&page=perf_top



C. Results 

Having simulated Planck data for a given ns, we com- 
pute the evidences of the two models, which we denote by 
HZ and VARYn. The former is of course the Harrison- 
Zel'dovich model with ns fixed to one. The latter is a 
model with ns allowed to vary in fits to the data. In 
each case, all the other parameters are allowed to vary, 
each with the same prior range as used in Ref. [?]. This
is repeated for different values of fiducial ns.

As in Ref. [4], the prior range for ns is taken to be
0.8 < ns < 1.2, representing a reasonable range allowed
by slow-roll inflation models (see e.g. Ref. [1]). The end
result does have some prior dependence. If the prior is
widened in regions where the likelihood is negligible, then
the evidence just changes in inverse proportion to the prior
volume, so for instance a doubling of the prior range will
only reduce the ln(evidence) by ln 2 = 0.69. This indicates
that the prior range is not very important for this
parameter.
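
The ln 2 shift can be checked with a one-dimensional toy calculation. The sketch below assumes a Gaussian likelihood in ns with unit peak value and a hypothetical width of 0.004, under a flat prior; neither the width nor the function name comes from the text.

```python
import math

def ln_evidence(sigma, centre, prior_lo, prior_hi):
    """ln E for a Gaussian likelihood (peak value 1) under a flat prior."""
    scale = sigma * math.sqrt(2.0)
    # Fraction of the Gaussian's area that falls inside the prior range.
    inside = 0.5 * (math.erf((prior_hi - centre) / scale)
                    - math.erf((prior_lo - centre) / scale))
    area = math.sqrt(2.0 * math.pi) * sigma * inside
    return math.log(area / (prior_hi - prior_lo))

narrow = ln_evidence(0.004, 1.0, 0.8, 1.2)   # prior range used in the text
wide = ln_evidence(0.004, 1.0, 0.6, 1.4)     # same centre, doubled range
print(round(narrow - wide, 2))               # 0.69
```

Because the likelihood is negligible near both sets of prior edges, only the prior width in the denominator changes.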

We use the CosmoNest algorithm described in Refs. [?]
to compute the evidences. This is based on the nested
sampling algorithm of Skilling [?], and is a fast Monte
Carlo (but not Markov chain) method for accurately av- 
eraging the likelihood across the entire prior space. The 
algorithm parameters used were N = 300 live points and 
an enlargement factor of 1.8 for HZ and 1.9 for VARYn. 
These enlargement factors are higher than those required 
for the same models and similar target accuracy with say 
WMAP data. This is because as the data improve the 
likelihood contours in the high likelihood regions can de- 
viate from elliptical and become more banana shaped. 
The tolerance parameter was set to 0.5 which gave an- 
swers to good accuracy as indicated by the uncertainties 
obtained. Four independent evidence evaluations were 
done for each calculation, to obtain the mean and its 
standard error. 
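
To illustrate the roles of the live points, the enlargement factor and the tolerance, here is a toy one-dimensional nested sampling routine. It is not CosmoNest: the bounding interval stands in for the bounding ellipsoid, the Gaussian likelihood with width 0.004 is a hypothetical choice, and all function names are invented for this sketch.

```python
import math
import random
import statistics

def log_add(a, b):
    """Return ln(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def nested_ln_evidence(log_like, lo, hi, n_live=300, enlarge=1.8, tol=0.5, seed=0):
    """Toy 1D nested sampling with a flat prior on [lo, hi]."""
    rng = random.Random(seed)
    live = [rng.uniform(lo, hi) for _ in range(n_live)]
    logl = [log_like(x) for x in live]
    ln_z = -math.inf                                # accumulated ln(evidence)
    ln_x = 0.0                                      # ln of remaining prior mass
    shell = math.log1p(-math.exp(-1.0 / n_live))    # ln(1 - exp(-1/N))
    while True:
        worst = min(range(n_live), key=logl.__getitem__)
        l_star = logl[worst]
        ln_z = log_add(ln_z, l_star + ln_x + shell)
        ln_x -= 1.0 / n_live
        # Bounding interval of the live points, enlarged about its centre.
        mid = 0.5 * (min(live) + max(live))
        half = 0.5 * (max(live) - min(live)) * enlarge
        a, b = max(lo, mid - half), min(hi, mid + half)
        while True:                                 # draw until L exceeds L*
            x = rng.uniform(a, b)
            lx = log_like(x)
            if lx > l_star:
                break
        live[worst], logl[worst] = x, lx
        # Stop once the live points can shift ln Z by less than tol.
        if log_add(ln_z, max(logl) + ln_x) - ln_z < tol:
            break
    ln_mean_l = -math.inf
    for value in logl:                              # mean likelihood of live points
        ln_mean_l = log_add(ln_mean_l, value)
    ln_mean_l -= math.log(n_live)
    return log_add(ln_z, ln_mean_l + ln_x)

def gaussian_log_like(ns, centre=1.0, sigma=0.004):
    return -0.5 * ((ns - centre) / sigma) ** 2

# Four independent evaluations, then the mean and its standard error.
runs = [nested_ln_evidence(gaussian_log_like, 0.8, 1.2, seed=s) for s in range(4)]
mean = statistics.mean(runs)
std_err = statistics.stdev(runs) / math.sqrt(len(runs))
exact = math.log(math.sqrt(2.0 * math.pi) * 0.004 / 0.4)
print(f"ln E = {mean:.2f} +/- {std_err:.2f} (analytic {exact:.2f})")
```

In this toy the enlargement factor only needs to ensure that the sampling interval covers the whole region above the current likelihood threshold; in many dimensions, with curved contours, that requirement is what drives the factor upwards.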

Figure 1 shows our main result. At ns = 1, the HZ
model is strongly preferred with ln B_10 = -3.6 ± 0.1. It
has a higher evidence since it can fit the data just as well
as VARYn and has one fewer parameter. Once ns is far
enough away from 1, the HZ fit becomes very poor and 
the Bayes factor plummets. The speed with which this 
happens indicates the strength of the experiment. 

We see that if the true value lies in the range 0.989 <
ns < 1.011, Bayesian model selection will favour the HZ
model, and within the narrower range 0.994 < ns < 1.006
it will give strong support to that model, though Planck
on its own is not powerful enough to be able to decisively
favour HZ over VARYn even if HZ is the true case. Only
once ns < 0.986 or ns > 1.014 can Planck offer strong
evidence against HZ, rapidly becoming decisive as the
fiducial value moves away from unity beyond 0.983 or
1.017.

FIG. 1: The (negative of the) logarithm of the Bayes factor,
-ln B_10, as a function of the fiducial value of ns, where M_0
is the HZ model and M_1 is VARYn. The horizontal lines
indicate where the comparison becomes 'strong' (dashed) and
'decisive' (solid) on the Jeffreys' scale.

We can contrast these model selection results with 
those indicated by parameter estimation. Using the same 
simulated data, we compute the marginalized likelihood 
of ns about ns = 1. This gives a 68% range of 0.995 <
ns < 1.004 and a 95% range of 0.991 < ns < 1.008,
in good agreement with estimates obtained by other au- 
thors including the Planck Blue Book. We see this is an 
explicit example of Lindley's paradox; there are values 
of ns lying outside the 95 percent confidence region, for 
which model selection would nevertheless favour the HZ 
model. 
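
A rough cross-check of this behaviour is possible under a Gaussian approximation, which is not the method used for the results in this paper. The sketch assumes a Gaussian likelihood in ns with width 0.0043 (inferred here from the 95% range quoted above), a flat prior of width 0.4, and that HZ and VARYn fit equally well at ns = 1. The function names are hypothetical.

```python
import math

SIGMA = 0.0043        # assumed: half-width of the quoted 95% range divided by 1.96
PRIOR_WIDTH = 0.4     # prior range 0.8 < ns < 1.2

def ln_b10(ns_fid, sigma=SIGMA, width=PRIOR_WIDTH):
    """Approximate ln B_10 (VARYn over HZ) for data peaked at ns_fid."""
    occam = math.log(math.sqrt(2.0 * math.pi) * sigma / width)   # prior-volume penalty
    misfit = (ns_fid - 1.0) ** 2 / (2.0 * sigma ** 2)            # cost of fixing ns = 1
    return occam + misfit

def offset_for(ln_b, sigma=SIGMA, width=PRIOR_WIDTH):
    """|ns - 1| at which the approximate ln B_10 reaches ln_b."""
    occam = math.log(math.sqrt(2.0 * math.pi) * sigma / width)
    return sigma * math.sqrt(2.0 * (ln_b - occam))

print(round(ln_b10(1.0), 2))          # -3.61
print(round(offset_for(0.0), 4))      # 0.0116  (HZ favoured inside this offset)
print(round(offset_for(2.5), 4))      # 0.015   (strong evidence against HZ)
print(round(offset_for(5.0), 4))      # 0.0178  (decisive evidence against HZ)
print(round(1.96 * SIGMA, 4))         # 0.0084  (edge of the 95% range)
```

Under these assumptions the offsets land within roughly 0.001 of the thresholds quoted above, and the gap between 0.0084 and 0.0116 is the window in which ns lies outside the 95% range while HZ is still favoured.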

We end by estimating how likely it is that Planck will 
be able to make a decisive selection between our two mod- 
els, based on current understanding of the spectral index. 
We use a variant of Trotta's ExPO approach [3], but
with one important distinction: we use the current
model selection position as input, whereas Trotta used 
the observed likelihood in the VARYn model alone. For 
simplicity we consider only the marginalized likelihood 
for ns as the starting point rather than marginalizing 
the model selection outcome over all parameters, but we 
expect that to make little difference in this case. 

According to Parkinson et al. [11], following WMAP3
the balance of probability between HZ and VARYn is
12% to 88% (with some dependence on the choice of data
compilation). This makes the important assumption that
the models were thought equally likely before the data 
came along; anyone who thinks otherwise can readily re- 
compute according to their own prejudice. In essence, 
one can think of the probability distribution for ns as 
being a weighted superposition of the likelihood in the
VARYn model plus a delta-function at ns = 1. Trotta
omits the delta-function term in his ExPO forecasts. 

For the 12% probability that ns is actually one, Planck 
will clearly not find evidence to the contrary, but as we 
have seen would in that case provide strong evidence for 
the HZ case. For the remaining probability, we use the 
marginalized distribution for ns as computed in Ref. [?].
We find that 7% of the posterior lies in the region ns >
0.983 where even Planck cannot make a decisive verdict.
We can therefore conclude that if ns is not one, then
Planck is expected to provide a decisive verdict against
HZ, which WMAP3 has not achieved, though there is a
small chance that it will not.
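
The bookkeeping behind this forecast can be laid out explicitly. The sketch below simply combines the three numbers quoted above (12%, 88% and 7%) into overall outcome probabilities; the labels and the function name are chosen for this example, and the 7% is read as a fraction of the VARYn posterior.

```python
P_HZ = 0.12              # current probability of the HZ model
P_VARY = 0.88            # current probability of the VARYn model
FRAC_INDECISIVE = 0.07   # fraction of the VARYn posterior with ns > 0.983

def planck_outcomes(p_hz=P_HZ, p_vary=P_VARY, frac_indecisive=FRAC_INDECISIVE):
    """Split the total probability into the three forecast outcomes."""
    return {
        "strong support for HZ": p_hz,
        "decisive against HZ": p_vary * (1.0 - frac_indecisive),
        "not decisive against HZ": p_vary * frac_indecisive,
    }

for outcome, prob in planck_outcomes().items():
    print(f"{outcome}: {prob:.1%}")
```

With these inputs a decisive verdict against HZ carries a little over 80% of the total probability, and an indecisive result with ns different from one about 6%.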

Trotta [3] came to the same verdict that Planck is very
likely to rule out ns = 1, but using WMAP1 data. However,
we would not have come to that conclusion using
our modification of his approach, as with WMAP1 the
model selection verdict put somewhat more than half
the probability in the HZ case [?], and also a significant
part of the ns ≠ 1 probability into the indecisive region.
Accordingly, at that point we would have said that the
most likely outcome of Planck (under the assumption of
equal model prior probabilities) was strong support for
ns = 1. However, as is always the danger with probabilities,
WMAP3 has overturned that conclusion.



III. CONCLUSIONS 

We have carried out a model selection forecast for the 
Planck satellite, focussing on the scalar spectral index. 
Such analyses complement the usual parameter error 
forecasts, and are particularly directed to the question 
of when one can robustly identify the need for new fit 
parameters. In particular, we have delineated the values 
of ns for which strong or decisive model comparisons can 
be carried out. Ruling out ns = 1 is found to be signifi- 
cantly harder than parameter error forecasts suggest. 

The recent WMAP3 data have left ns poised in an 
interesting position, where model selection analyses do 
support parameter estimation conclusions but not yet at 
a decisive level. Our results show that if ns really is 
different from one, then Planck is very likely to be able 
to confirm that, but if the HZ case is the true one then 
even Planck will not be decisive. 